109 research outputs found

    Ontobull and BFOConvert: Web-based programs to support automatic ontology conversion

    When a widely reused ontology is released in a new version that is incompatible with older versions, the ontologies reusing it need to be updated accordingly. Ontobull was developed to automatically update ontologies with new term IRIs and associated metadata to account for such version changes. To use the Ontobull web interface, a user must (i) upload one or more ontology OWL source files; (ii) input an ontology term IRI mapping; and, where needed, (iii) provide update settings for ontology headers and XML namespace IDs. Using this information, the backend Ontobull Java program automatically updates the OWL ontology files with the desired term IRIs and ontology metadata. The Ontobull subprogram BFOConvert supports the conversion of an ontology that imports a previous version of BFO. A use case is provided to demonstrate the features of Ontobull and BFOConvert.
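The IRI-mapping step described in this abstract can be illustrated with a minimal Python sketch. This is not Ontobull's code (the abstract says the backend is a Java program); the function name `apply_iri_mapping` and the `example.org` IRIs are hypothetical, and the sketch assumes the OWL source is plain text in which term IRIs appear verbatim.

```python
import re


def apply_iri_mapping(owl_text: str, iri_mapping: dict[str, str]) -> str:
    """Replace each old term IRI in OWL source text with its mapped new IRI.

    Hypothetical helper for illustration only; not part of Ontobull.
    """
    if not iri_mapping:
        return owl_text
    # Try longer IRIs first, and require that a match is not followed by
    # another identifier character, so that ...#Term1 never rewrites the
    # beginning of ...#Term10.
    alternatives = "|".join(
        re.escape(old) for old in sorted(iri_mapping, key=len, reverse=True)
    )
    pattern = re.compile(rf"(?:{alternatives})(?![A-Za-z0-9_\-])")
    # A single pass means a chained mapping (A -> B, B -> C) does not cascade.
    return pattern.sub(lambda match: iri_mapping[match.group(0)], owl_text)


# Hypothetical mapping from an old term IRI to its replacement.
mapping = {"http://example.org/old#Term1": "http://example.org/new/TERM_0000001"}
owl_source = (
    '<owl:Class rdf:about="http://example.org/old#Term1"/>\n'
    '<owl:Class rdf:about="http://example.org/old#Term10"/>'
)
updated_source = apply_iri_mapping(owl_source, mapping)
print(updated_source)
```

Header and XML namespace updates, the optional third input in the abstract, are left out of this sketch.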

    Rational Vaccine Design by Reverse & Structural Vaccinology and Ontology

    Vaccination is one of the most successful public health interventions in modern medicine. However, it is still challenging to develop effective vaccines against many infectious diseases such as tuberculosis, HIV, and malaria. There are challenges in integrating the high volume, variety, and variability of vaccine-related data and in rationally and efficiently designing effective and safe vaccines. In my thesis study, I systematically and comprehensively analyzed manually annotated protective vaccine antigens in the Protegen database and identified the patterns enriched among these protective antigens. I then created Vaxign-ML, a novel machine learning-based reverse vaccinology method based on the curated Protegen data for rational vaccine design. Vaxign-ML was used to successfully predict vaccine antigens for tuberculosis and Coronavirus Disease 2019 (COVID-19). I also developed a new structural vaccinology design program that optimizes the COVID-19 spike glycoprotein as a vaccine candidate for enhanced vaccine protection via T cell epitope engineering. The vaccine antigens selected and optimized by Reverse and Structural Vaccinology in this dissertation remain subject to future experimental verification. Furthermore, I created a community-based Ontology of Host-Pathogen Interactions (OHPI), which serves as a platform to semantically represent the interactions between hosts and virulence factors that are also protective antigens. I developed the Vaccine Investigation Ontology (VIO) for standardized metadata representation in high-throughput vaccine OMICS data analysis. Overall, my thesis research aims to uncover protective antigen patterns, create methods and tools to effectively develop vaccines against infectious diseases of public health significance, and strengthen our understanding of vaccine protection mechanisms.
    These works can be further expanded and integrated with other technologies such as epitope prediction, molecular epidemiology, and high-throughput sequencing to build the foundation of precision vaccinology.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/167997/1/edong_1.pd

    COVID-19 Coronavirus Vaccine Design Using Reverse Vaccinology and Machine Learning

    To ultimately combat the emerging COVID-19 pandemic, an effective and safe vaccine is needed against this highly contagious disease caused by the SARS-CoV-2 coronavirus. Our literature and clinical trial survey showed that the whole virus, as well as the spike (S) protein, nucleocapsid (N) protein, and membrane (M) protein, have been tested for vaccine development against SARS and MERS. However, these vaccine candidates might not induce complete protection and might raise safety concerns. We then applied Vaxign and the newly developed machine learning-based Vaxign-ML reverse vaccinology tools to predict COVID-19 vaccine candidates. Our Vaxign analysis found that the SARS-CoV-2 N protein sequence is conserved with SARS-CoV and MERS-CoV but not with the other four human coronaviruses that cause mild symptoms. By investigating the entire proteome of SARS-CoV-2, six proteins, including the S protein and five non-structural proteins (nsp3, 3CL-pro, and nsp8-10), were predicted to be adhesins, which are crucial to viral adhesion and host invasion. The S, nsp3, and nsp8 proteins were also predicted by Vaxign-ML to induce high protective antigenicity. Unlike the commonly used S protein, the nsp3 protein has not been tested in any coronavirus vaccine study, and it was selected for further investigation. The nsp3 protein was found to be more conserved among SARS-CoV-2, SARS-CoV, and MERS-CoV than among 15 coronaviruses infecting humans and other animals. The protein was also predicted to contain promiscuous MHC-I and MHC-II T-cell epitopes, and the predicted linear B-cell epitopes were found to be localized on the surface of the protein. Our predicted vaccine targets have the potential for effective and safe COVID-19 vaccine development.
    We also propose that an “Sp/Nsp cocktail vaccine” containing a structural protein(s) (Sp) and a non-structural protein(s) (Nsp) would stimulate effective complementary immune responses.
    http://deepblue.lib.umich.edu/bitstream/2027.42/156072/1/fimmu-11-01581.pdf

    Comparison, alignment, and synchronization of cell line information between CLO and EFO

    Background: The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables, including cell lines, that organizes and describes the diverse experimental variables and data residing in the EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications. It is desirable that the community shares design patterns and therefore that EFO reuses the cell line representation from CLO. There are, however, challenges to be addressed when developing a common ontology design pattern for representing cell lines in both EFO and CLO. Results: In this study, we developed a strategy to compare and map cell line terms between EFO and CLO. We examined Cellosaurus resources for EFO-CLO cross-references. Text labels of cell lines from both ontologies were verified by biological information axiomatized in each source. The study resulted in the identification of 873 EFO-CLO aligned and 344 EFO unique immortalized permanent cell lines. All of these cell lines were updated to CLO and the cell line related information was merged. A design pattern that integrates EFO and CLO was also developed. Conclusion: Our study compared, aligned, and synchronized the cell line information between CLO and EFO. The final updated CLO will be examined as the candidate ontology to import and replace eligible EFO cell line classes, thereby supporting interoperability in the bio-ontology domain. Our mapping pipeline illustrates the use of ontology in aiding biological data standardization and integration through the biological and semantic content of cell lines.
    https://deepblue.lib.umich.edu/bitstream/2027.42/140391/1/12859_2017_Article_1979.pd

    The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability

    Ontologies are critical to data/metadata and knowledge standardization, sharing, and analysis. With hundreds of biological and biomedical ontologies developed, it has become critical to ensure ontology interoperability and the usage of interoperable ontologies for standardized data representation and integration. The suite of web-based Ontoanimal tools (e.g., Ontofox, Ontorat, and Ontobee) supports different aspects of extensible ontology development. By summarizing the common features of Ontoanimal and other similar tools, we identified and proposed an “eXtensible Ontology Development” (XOD) strategy and its four associated principles. These XOD principles reuse existing terms and semantic relations from reliable ontologies, develop and apply well-established ontology design patterns (ODPs), and involve community efforts to support new ontology development, promoting standardized and interoperable data and knowledge representation and integration. The adoption of the XOD strategy, together with robust XOD tool development, will greatly support ontology interoperability and robust ontology applications that make data Findable, Accessible, Interoperable, and Reusable (FAIR).
    https://deepblue.lib.umich.edu/bitstream/2027.42/140740/1/13326_2017_Article_169.pd

    CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis

    Ontologies, as the term is used in informatics, are structured vocabularies comprised of human- and computer-interpretable terms and relations that represent entities and relationships. Within informatics fields, ontologies play an important role in knowledge and data standardization, representation, integration, sharing and analysis. They have also become a foundation of artificial intelligence (AI) research. In what follows, we outline the Coronavirus Infectious Disease Ontology (CIDO), which covers multiple areas in the domain of coronavirus diseases, including etiology, transmission, epidemiology, pathogenesis, diagnosis, prevention, and treatment. We emphasize CIDO development relevant to COVID-19.

    Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses

    Background: Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) was selected as the default ontology for LINCS cell line representation and integration. Results: CLO has consistently represented all 1097 LINCS cell lines and included information extracted from the LINCS Data Portal and ChEMBL. Using MCF 10A cell line cells as an example, we demonstrated how to ontologically model LINCS cellular signatures such as their non-tumorigenic epithelial cell type, three-dimensional growth, latrunculin-A-induced actin depolymerization and apoptosis, and cell line transfection. A CLO subset view of LINCS cell lines, named LINCS-CLOview, was generated to support systematic LINCS cell line analysis and queries. In summary, LINCS cell lines are currently associated with 43 cell types, 131 tissues and organs, and 121 cancer types. The LINCS-CLOview information can be queried using SPARQL scripts. Conclusions: CLO was used to support ontological representation, integration, and analysis of over a thousand LINCS cell line cells and their cellular responses.
    https://deepblue.lib.umich.edu/bitstream/2027.42/140390/1/12859_2017_Article_1981.pd

    Systems consequences of amplicon formation in human breast cancer

    Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found that large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers, albeit using approaches with lower resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer.
    Singapore. Agency for Science, Technology and Research; National Science Foundation (U.S.) (East Asia and Pacific Summer Institutes (OISE-1108282)

    Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila

    Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; (iii) an integrative consensus network which minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies on the numerous multi-species gene expression datasets for other biological processes available in the literature.
    Comment: 10 pages text + 3 figures + 1 table + 2 supplementary figures + 3 supplementary tables
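The consensus criterion in finding (iii) can be made concrete with a small Python sketch. It assumes the simplest setting, which the abstract does not confirm: unweighted edge sets, every species counted equally, and no phylogenetic constraint. Under those assumptions, the network minimizing the total number of edge gains and losses is the majority-vote network, because each edge can be decided independently. The function names and the toy gene labels are hypothetical.

```python
from collections import Counter
from typing import Hashable

# A directed edge (regulator, target); the labels are placeholders.
Edge = tuple[Hashable, Hashable]


def consensus_network(networks: list[set[Edge]]) -> set[Edge]:
    """Keep each edge that occurs in more than half of the input networks.

    Assumes unweighted edges and equal weight per species (a simplification).
    """
    counts = Counter(edge for network in networks for edge in network)
    return {edge for edge, n in counts.items() if 2 * n > len(networks)}


def total_edge_changes(candidate: set[Edge], networks: list[set[Edge]]) -> int:
    """Total edge gains plus losses needed to turn candidate into each network."""
    return sum(len(candidate ^ network) for network in networks)


# Three toy single-species networks.
species_networks = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("d", "e")},
]
consensus = consensus_network(species_networks)
print(sorted(consensus), total_edge_changes(consensus, species_networks))
```

An edge present in exactly half of the networks costs the same whether it is kept or dropped; this sketch drops it.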